Trees and Nets
Kerry Back
BUSI 520, Fall 2022
JGSB, Rice University
Decision tree
- Split the sample successively into smaller subsamples by answering “yes-no” questions.
- Each question is based on a single variable: is it above a threshold?
- Split a specified number of times (depth). Final subsamples are called leaves.
- The prediction for each observation is the mean outcome of the training observations that end up in the same leaf.
- The variable to split on and the threshold are chosen at each split to minimize the sum of squared errors (SSE); a sketch follows below.
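A minimal sketch of fitting a regression tree with scikit-learn. The simulated data and the names `X_train` and `y_train` are illustrative assumptions, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# simulated training data: 1,000 observations, 4 features (illustrative only)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))
y_train = X_train[:, 0] + rng.normal(size=1000)

# max_depth=2 means two rounds of yes/no splits, so at most 4 leaves
tree = DecisionTreeRegressor(max_depth=2)
tree.fit(X_train, y_train)

# each prediction is the mean of y_train over the training observations in that leaf
y_hat = tree.predict(X_train)
```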
Illustration
Random forest
- Multiple trees, each fit to a random sample of the data
- Data for each tree is a bootstrapped sample:
  - random selection of rows (with replacement)
  - same size as the original sample
- Prediction is the average of the trees’ predictions
- Hyperparameters = number of trees and depth of trees; a sketch follows below
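A minimal scikit-learn sketch, reusing the simulated `X_train` and `y_train` from the decision-tree example (both are illustrative assumptions).

```python
from sklearn.ensemble import RandomForestRegressor

# n_estimators = number of trees, max_depth = depth of each tree;
# bootstrap=True draws a same-size sample of rows with replacement for each tree
forest = RandomForestRegressor(n_estimators=100, max_depth=3, bootstrap=True)
forest.fit(X_train, y_train)

# the forest's prediction is the average of the individual trees' predictions
y_hat = forest.predict(X_train)
```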
Gradient boosting
- Multiple trees
- First tree fit to data
- Second tree fit to errors from first tree
- Third tree fit to errors from second tree, …
- Prediction is the sum of the trees’ predictions
- Hyperparameters = number of trees and depth of trees; a sketch follows below
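A minimal scikit-learn sketch, again reusing the simulated `X_train` and `y_train` (illustrative assumptions). Note that scikit-learn also scales each tree’s contribution by a learning rate.

```python
from sklearn.ensemble import GradientBoostingRegressor

# n_estimators = number of trees, max_depth = depth of each tree
boost = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
boost.fit(X_train, y_train)

# each tree is fit to the errors of the trees before it;
# the prediction adds up the trees' (scaled) contributions
y_hat = boost.predict(X_train)
```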
Multi-layer perceptrons
- A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
- A neuron is a mathematical function. It takes inputs \(x_1, \ldots, x_n\), calculates a function \(y=f(x_1, \ldots, x_n)\), and passes \(y\) to the neurons in the next layer.
- The inputs in the first layer are the predictors.
- The inputs in successive layers are the calculations from the prior layer.
- The last layer is a single neuron that produces the output.
Illustration
- 4 independent variables (features)
- 5 functions of the 4 features are calculated in the “hidden layer.”
- The output is a function of the 5 numbers calculated in the hidden layer; a sketch follows below.
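A minimal scikit-learn sketch of the illustrated network: 4 features, one hidden layer of 5 neurons, and a single output. The simulated `X_train` and `y_train` are the illustrative data from the earlier examples.

```python
from sklearn.neural_network import MLPRegressor

# hidden_layer_sizes=(5,) means one hidden layer with 5 neurons;
# the output layer is a single linear neuron
mlp = MLPRegressor(hidden_layer_sizes=(5,), activation="relu", max_iter=1000)
mlp.fit(X_train, y_train)
y_hat = mlp.predict(X_train)
```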
Rectified linear units
- The usual function for the neurons (except in the last layer) is \[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\] The parameters \(b\) (called the bias) and \(w_1, \ldots, w_n\) (called the weights) are different for different neurons.
- This function is called a rectified linear unit (ReLU). Its graph looks like a call option payoff.
- The last layer uses a linear function \[ y = b+w_1x_1 + \cdots + w_nx_n\] A sketch of both computations follows below.
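A minimal numpy sketch of one hidden-layer neuron and one last-layer neuron. The bias and weight values are made-up numbers for illustration.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0])   # inputs (features or prior-layer outputs)
w = np.array([0.2, 0.1, -0.3, 0.4])   # weights, specific to this neuron
b = -1.0                              # bias, specific to this neuron

y_hidden = max(0.0, b + w @ x)        # hidden-layer neuron: ReLU, like an option payoff
y_output = b + w @ x                  # last-layer neuron: linear, no max
```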
Analogy to neurons firing
- If the \(w_i>0\) and \(b<0\), then \(y>0\) only when the \(x_i\) are large enough.
- A neuron fires when it is sufficiently stimulated by signals from other neurons (in the prior layer).
Deep learning
- Deep learning means a neural network with many layers.
- Deep learning is behind facial recognition, self-driving cars, …
- Need a specialized library, probably TensorFlow (from Google) or PyTorch (from Facebook); a PyTorch sketch follows below
- And probably need a graphics processing unit (GPU), i.e., run it on a video card
- Can often start from a pretrained model
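A minimal PyTorch sketch of a small network with ReLU hidden layers and a linear output. The layer sizes, simulated data, learning rate, and number of training steps are all illustrative assumptions.

```python
import torch
from torch import nn

# two hidden ReLU layers ("deeper" than the earlier illustration), linear output
model = nn.Sequential(
    nn.Linear(4, 5),   # 4 features -> 5 hidden neurons
    nn.ReLU(),
    nn.Linear(5, 5),   # second hidden layer of 5 neurons
    nn.ReLU(),
    nn.Linear(5, 1),   # single linear output neuron
)
# on a GPU, move the model and tensors with .to("cuda")

X = torch.randn(1000, 4)                   # simulated features
y = X[:, :1] + 0.1 * torch.randn(1000, 1)  # simulated target

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
for _ in range(200):                       # training loop: minimize mean squared error
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```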